Abstract

We consider algorithms for load balancing on unreliable machines. The objective is to optimize
the two criteria of minimizing the makespan and minimizing job reassignments in response
to machine failures. We assume that the set of jobs is known in advance but that the pattern
of machine failures is unpredictable. Motivated by the requirements of BGP routing, we consider
path-independent algorithms, with the property that the job assignment is completely
determined by the subset of available machines and not the previous history of the assignments.
We examine first the question of performance measurement of path-independent load-balancing
algorithms, giving the measure of makespan and the normalized measure of reassignment cost.
We then describe two classes of algorithms for optimizing these measures against an oblivious
adversary for identical machines. The first, based on independent random assignments, gives
expected reassignment costs within a factor of 2 of optimal and gives a makespan within a factor
of $O(\log m/\log\log m)$ of optimal with high probability, for unknown job sizes. The second, in
which jobs are first grouped into bins and at most one bin is assigned to each machine, gives
constant-factor ratios on both reassignment cost and makespan, for known job sizes. Several
open problems are discussed.

1 Introduction



Given a set of jobs $J = \{1, \ldots, n\}$ and machines $M = \{1, \ldots, m\}$, where each job $j$ has a size or
processing time $p_j$, the problem of load balancing is to construct an assignment $f : J \to M$
that minimizes the makespan, defined as $C_{\max} = \max_{i \in M} \sum_{j \in f^{-1}(i)} p_j$.^1
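
The makespan of a given assignment can be computed directly from this definition. The following is a minimal sketch, assuming the assignment $f$ and the sizes $p_j$ are stored as Python dictionaries; the function name `makespan` and the sample data are illustrative and do not come from the paper.

```python
from collections import defaultdict

def makespan(assignment, sizes):
    """Return the maximum total load on any machine.

    assignment: dict mapping job -> machine (the function f).
    sizes: dict mapping job -> processing time p_j.
    """
    load = defaultdict(float)
    for job, machine in assignment.items():
        load[machine] += sizes[job]
    return max(load.values(), default=0.0)

# Illustrative data: four jobs on two machines "a" and "b".
sizes = {1: 3.0, 2: 2.0, 3: 2.0, 4: 1.0}
f = {1: "a", 2: "b", 3: "b", 4: "a"}
print(makespan(f, sizes))  # both machines carry a total load of 4.0
```

Finding the assignment that minimizes this quantity is the hard part; the sketch only evaluates a candidate.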

We consider a variant of load balancing in which the set of available machines S changes over 
time. In effect, we are solving a sequence of $P||C_{\max}$ scheduling problems where the set of jobs is
fixed but the set of machines varies, and our goal is to minimize both the makespan $C_{\max}$ and the
reassignment cost of moving jobs from one machine to another as new machines become available 
and old machines leave. As a further complication, we restrict ourselves to path-independent 
algorithms, those that always assign the same jobs to the same machines given a particular set 
S despite any previous history of assignments. This restriction simplifies the description of an 
algorithm, since we can just present an assignment for each nonempty set of machines, but it may 
dramatically increase reassignment costs since we cannot take previous assignments into account. 
Surprisingly, we show that randomization can lower the expected reassignment cost between any 
two states of a path-independent algorithm to within a constant factor of optimal, while maintaining 
a constant approximation ratio to the optimal makespan. 

This variant of load-balancing is inspired by the problem of routing among multiple network 
paths using the Border Gateway Protocol (BGP) [10], the de facto interdomain routing protocol 
of the Internet. In BGP, the global Internet is divided into multiple autonomous systems (AS). 
Each AS has several peering ASes, and each AS exports to its peers the available routes to destination
prefixes (jobs). Each AS maintains a cache (called the routing information base) of currently available
routes to the destinations exported by its peers, and selects routes to destinations from the routing 
cache using a route selection algorithm. One major objective of the route selection algorithm 
is load balancing among multiple peering links (machines) [15]. Figure 1 shows the standard
protocol/process model of interdomain route selection of BGP [2, 13, 15].

Figure 1: The protocol/process model of route selection for interdomain traffic engineering of BGP.

Because network connections are dynamic, the set of routes that are available at any time may 
vary. When the set of available routes changes, BGP will re-run the route selection algorithm. If 
the assignment of the route to a destination is changed, BGP will update its peers, and this update 
can propagate throughout the global Internet. Thus, it is important to minimize the number of 
reassignments, since they are expensive operations both in terms of bandwidth and router CPU 
processing [14]. 

Naturally, some reassignments are unavoidable: when a peering link goes down, all destination
prefixes that previously used that link must be assigned elsewhere, and when new links become
available, we cannot refuse to reassign destinations to them without violating our load-balancing
criterion. So the number of reassignments performed by an algorithm will need to be measured
against an estimate of how many reassignments are necessary, which will be obtained by considering
the behavior of an ideal algorithm that maintains perfect load balance with minimal reassignment
costs.

1 In the classic $\alpha|\beta|\gamma$ notation of Graham et al. [4], we are considering primarily the $P||C_{\max}$ scheduling problem.
Our use of notation largely follows that of [7].

A further complication is that routers (e.g., Cisco and Juniper routers) implement route
selection by assigning priorities (called local preference values) to routes. This means that the
assignments of destinations to peering links are unaffected by past history, a property known as
path-independence. Thus, we will consider only load-balancing algorithms in which the
current assignment is determined only by the current set of available machines. Furthermore, for
multihomed stub ASes, such path-independent assignments can guarantee interdomain routing
stability [15]. Path-independence is thus a desirable property of load-balancing algorithms for
unreliable machines, motivating our study of such algorithms.

Formally, a path-independent load-balancing algorithm is just a function from $2^M \setminus \{\emptyset\}$ to $M^J$,
where the argument represents the (always nonempty) set of available machines and the function
value represents an assignment, with the constraint that no job is ever assigned to a machine that
is not in the available set. Given such an algorithm $A$, we write $A_S$ for the assignment chosen by
$A$ for available set $S$ and $A_S(j)$ for the machine to which job $j$ is assigned. With this notation, the
constraint on $A$ is that $A_S(j) \in S$ for all $S$ and $j$.

Because the assignment of jobs to machines is determined by the set of available machines, this 
set completely determines the state of the system. We thus identify sets of available machines with 
states of the system and will refer to such sets simply as states hereafter. 
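
The formal definition can be mirrored in code by treating an algorithm as a pure function of the available set. The sketch below is illustrative only: `round_robin` is a hypothetical toy algorithm (not one from the paper), and `is_valid` checks the constraint $A_S(j) \in S$ by brute force over all nonempty subsets, which is feasible only for small $m$.

```python
from itertools import combinations

def nonempty_subsets(machines):
    """Yield every nonempty subset of the machine set as a frozenset."""
    items = sorted(machines)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def is_valid(algorithm, jobs, machines):
    """Check the constraint A_S(j) in S for every state S and job j."""
    jobs = list(jobs)
    return all(algorithm(S)[j] in S
               for S in nonempty_subsets(machines) for j in jobs)

def round_robin(jobs):
    """Hypothetical toy algorithm: deal jobs cyclically over the sorted machines of S."""
    jobs = sorted(jobs)
    def algorithm(S):
        order = sorted(S)
        return {j: order[k % len(order)] for k, j in enumerate(jobs)}
    return algorithm

A = round_robin(range(4))
print(A(frozenset({1, 3})))              # depends only on the set {1, 3}
print(is_valid(A, range(4), {1, 2, 3}))
```

Because `algorithm` reads nothing but its argument `S`, calling it twice on the same state always returns the same assignment, which is exactly the path-independence property.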

1.1 Our results 

The paper contains several novel contributions: 

• The new problem of path-independent load balancing, motivated by practical issues in 
BGP routing. 

• A measure of reassignment costs that takes into account the difference between small and
large changes in the set of available machines. This measure is motivated and described in
Section 2. We also show that standard competitive analysis techniques [12] are not useful
for this problem.

• A simple algorithm, based on independent random machine preferences, that achieves low
reassignment costs and $O(\log m/\log\log m) \cdot OPT$ makespan even with unknown job sizes.
This algorithm, together with a more general class of preference-based algorithms of
which it is a member, is described in Section 3.

• A more sophisticated algorithm BinHash for jobs of known sizes, based on consolidating jobs
into a variable number of bins (depending on the number of available machines) and then
hashing the bins. This algorithm achieves constant approximation ratios on both reassignment
costs and makespan. It is described in Section 4.

Finally, in Section 5 we discuss possibilities for future work.

Because of space limitations, some proofs are sketched or moved to the appendix. Complete 
proofs can be found in the full paper. 

1.2 Related work



It has long been known that a simple greedy algorithm [3] achieves a makespan within a factor 
of 2 of optimal on identical machines. Much recent work on the problem has focused on on-line 
load-balancing, where jobs arrive one at a time; see [1] for an example of the current state of the 
art. Our work is distinguished from this work by the assumption that jobs are known, but that the 
set of available machines changes over time. 

Kalyanasundaram and Pruhs [5,6] have considered models of fault-tolerant scheduling for parallel
computers; here the issue is that a processor failure prevents any job assigned to that processor
from being completed, and the goal is to maximize the total value of completed jobs subject to
release time and completion time constraints. Their results are based on redundant scheduling
without pre-emption and are not directly applicable to our problem.

The technique of consolidating jobs in our algorithm for known job sizes is similar to an approach 
taken by Sibeyn [11] for load-balancing jobs with sizes drawn from a random distribution. Sibeyn's 
techniques are intended to reduce the variance in job sizes and do not have the low-reassignment- 
cost properties of our prefix-based method. 

There has been substantial work on load-balancing mechanisms based on the power of two 
choices, where jobs pick two (or more) machines at random and choose the more lightly-loaded 
machine. (See [8] for a survey of these results.) We found through experiments that this approach, 
while producing excellent makespans, does not appear to yield low reassignment costs: the chains 
of displaced jobs that migrate to less loaded machines when a new machine becomes available are 
simply too long. 

2 Measuring performance 

We measure the performance of a path-independent load-balancing algorithm by two criteria: the
makespan of $A_S$ for each set of available machines $S$, and the reassignment cost paid by $A$
when moving from one assignment $A_S$ to a different assignment $A_T$.

The makespan is defined using standard notation as $C^A_{\max}(S) = \max_{i \in S} \sum_{j \in A_S^{-1}(i)} p_j$, the
maximum total load on any one machine.

The reassignment cost $r_A(S,T)$ is the number of jobs that move from one machine to another
between $A_S$ and $A_T$. Formally, we define $r_A(S,T) = |\{ j \mid A_S(j) \ne A_T(j) \}|$. Note that sizes are not
used in computing the reassignment cost; we assume that all jobs incur an equal overhead when
reassigned regardless of size. Note also that reassignment cost is symmetric: $r_A(S,T) = r_A(T,S)$
for all $S$ and $T$.

An execution of a path-independent load-balancing algorithm is specified as a sequence of
states $S_0, S_1, S_2, \ldots$, which may or may not be finite. For a finite execution $S_0 \ldots S_t$, the total
reassignment cost of an algorithm $A$ is $\sum_{i=0}^{t-1} r_A(S_i, S_{i+1})$. For an infinite execution, the average
reassignment cost is defined as $\lim_{t \to \infty} \frac{1}{t} \sum_{i=0}^{t-1} r_A(S_i, S_{i+1})$. Note that because $r_A$ is bounded by
$n$ this limit always exists.
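
The two cost definitions translate directly into code. This is a minimal sketch, assuming an algorithm is represented as a Python function from a frozenset of machines to a dict of job assignments; `lowest_machine` is a hypothetical toy algorithm used only to exercise the functions.

```python
def reassignment_cost(algorithm, S, T):
    """r_A(S, T): number of jobs whose machine differs between A_S and A_T."""
    a_s, a_t = algorithm(frozenset(S)), algorithm(frozenset(T))
    return sum(1 for j in a_s if a_s[j] != a_t[j])

def total_reassignment_cost(algorithm, states):
    """Sum of r_A(S_i, S_{i+1}) over consecutive states of a finite execution."""
    return sum(reassignment_cost(algorithm, s, t)
               for s, t in zip(states, states[1:]))

def lowest_machine(jobs):
    """Hypothetical toy algorithm: every job goes to the smallest available machine."""
    jobs = list(jobs)
    return lambda S: {j: min(S) for j in jobs}

A = lowest_machine(range(3))
execution = [{1, 2}, {2}, {1, 2}, {1}]
print(total_reassignment_cost(A, execution))  # 3 + 3 + 0
```

The symmetry $r_A(S,T) = r_A(T,S)$ is visible in the code: swapping the arguments only swaps the two sides of the inequality test.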

Because both makespan and reassignment cost are parameterized by states, an algorithm's 
overall performance may depend strongly on the details of a specific execution. To be able to 
compare algorithms it will be useful to have summary statistics that describe the performance 
of an algorithm over all possible executions. Worst-case measures are of limited usefulness: the 
worst-case reassignment cost of any algorithm on two or more machines is n jobs per step, since 
we may have only a single available machine at each step that alternates between steps, forcing 
us to always reassign jobs. Similarly, the worst-case makespan, which arises when there is only
one available machine, does not distinguish algorithms in any way. So instead, we must consider
measures that take into account the difficulty of particular executions.

We will show first that a traditional competitive analysis approach does not suffice for this 
purpose, and then propose an alternative method where the cost of an execution is normalized 
by a straw-man reference cost corresponding to the performance of a particular idealized load- 
balancing algorithm. 

2.1 Impossibility of bicriteria-competitive algorithms 

The technique of competitive analysis [12] compares the cost of a candidate on-line algorithm 
(which receives the input only step-by-step) in a given history against the cost of an optimal off-line 
algorithm (which sees the entire input in advance and is typically not computationally bounded). 
A candidate algorithm is said to have competitive ratio $c$ or be $c$-competitive if its cost is at
most $c$ times the cost of the optimal algorithm, plus a constant.^2

For the path-independent load-balancing problem, we are measuring two quantities at each
step: the makespan in the new state, and the reassignment cost between the old state and the
new state. A natural way to apply competitive analysis to this situation might be to adopt a
bicriteria approach and insist that our candidate algorithm be $(\alpha, \beta)$-competitive, where $\alpha$ bounds
the ratio between the candidate's makespan in each state and the optimal algorithm's makespan
in its corresponding state, and $\beta$ bounds the ratio between the candidate's total reassignment cost
over a finite execution and the optimal algorithm's total reassignment cost.^3 We show below that
no such algorithm exists for any finite $\alpha$ and $\beta$ in any system with $n \ge 2$ jobs and $m \ge 2$ machines,
even if the jobs have identical sizes.

Theorem 1 Let $A$ be a path-independent load-balancing algorithm for a system with $n \ge 2$ jobs
and $m \ge 2$ machines. Then $A$ is not $(\alpha, \beta)$-competitive for any $\alpha$ and $\beta$.

Proof: Without loss of generality, let $m = 2$ (we can always make any additional machines
permanently unavailable).

Let $A$ be some candidate algorithm. We will show that $A$ is not competitive with respect to
reassignment costs regardless of makespan. Let $i \in \{1,2\}$ be such that some job is assigned to
machine $i$ in $A_{\{1,2\}}$; let $i'$ be the other machine. Now consider an execution in which $S_t = \{1,2\}$
when $t$ is even and $S_t = \{i'\}$ when $t$ is odd. Because there is some job that is assigned to machine $i$
at even times and not at odd times, $A$ pays a reassignment cost of at least 1 per step of the execution.
However, an optimal $A^*$ that assigns all jobs to machine $i'$ in all states pays 0. Neither a finite $\beta$ nor
an additive constant is large enough to overcome in the limit the infinite ratio between the
reassignment costs of $A$ and $A^*$. ∎
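
The adversarial execution used in the proof is easy to replay on a small instance. The following sketch assumes two jobs and two machines; the candidate algorithm `split` and the helper names are hypothetical and serve only to illustrate the argument.

```python
def cost(algorithm, S, T):
    """Number of jobs assigned differently in states S and T."""
    a_s, a_t = algorithm(frozenset(S)), algorithm(frozenset(T))
    return sum(1 for j in a_s if a_s[j] != a_t[j])

def adversarial_execution(algorithm, steps):
    """Alternate between {1, 2} and {i'}, where i' is the machine other than
    some machine i that holds a job in state {1, 2}."""
    both = frozenset({1, 2})
    i = min(set(algorithm(both).values()))
    i_prime = 3 - i
    states = [both if t % 2 == 0 else frozenset({i_prime})
              for t in range(steps + 1)]
    return states, i_prime

def split(S):
    """Hypothetical candidate: job 0 prefers machine 1, job 1 prefers machine 2."""
    return {0: 1 if 1 in S else 2, 1: 2 if 2 in S else 1}

states, i_prime = adversarial_execution(split, 10)
offline = lambda S: {0: i_prime, 1: i_prime}   # plays the role of A* in the proof

candidate_total = sum(cost(split, s, t) for s, t in zip(states, states[1:]))
offline_total = sum(cost(offline, s, t) for s, t in zip(states, states[1:]))
print(candidate_total, offline_total)
```

The candidate pays at least one reassignment on each of the ten transitions while the fixed assignment pays nothing, so no finite ratio can bound one total by the other.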

2 The constant allows excluding perverse effects of very short executions. 

3 The reason for using a worst-state ratio for makespan but a total ratio for reassignment costs is that the algorithm
could spend an arbitrarily long time in a bad state, while reassignment costs accumulate more straightforwardly over
transitions. We could also demand that the candidate's reassignment cost is at most $\beta$ times the optimal algorithm's
reassignment cost for each transition, but this only makes life harder for the candidate algorithm and excludes algorithms
that depend on amortizing reassignment costs.

2.2 Normalized reassignment costs

We define the normalized reassignment cost relative to the costs of an ideal algorithm that
always maintains perfect balance. We consider first the case of $n = k \cdot m!$ identical jobs, where $k$
is some integer; the $m!$ factor ensures that the jobs can be equally divided over any subset of the
machines. In this case we can directly calculate the reassignment cost when moving from state $S$
to state $T$.



Theorem 2 Consider a system with $m$ machines and $n = k \cdot m!$ identical jobs. Let $A$ be an algorithm
that assigns exactly $n/|S|$ jobs to each machine in each state $S$. Then the number of reassignments
performed by $A$ going from $S$ to $T$ is at least

$$r^*(S,T) = n \cdot \left(1 - \frac{|S \cap T|}{\max(|S|, |T|)}\right). \qquad (1)$$

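Proof: Consider the set of jobs assigned to machines in $S \setminus T$ by $A_S$; since $A$ assigns exactly
$n/|S|$ jobs to each machine in $S \setminus T$, there are $n \cdot |S \setminus T|/|S|$ such jobs total. All of these jobs must
be reassigned going from $S$ to $T$.

Conversely any job assigned to a machine in $T \setminus S$ by $A_T$ must also have been reassigned going
from $S$ to $T$. Though these two sets of jobs may overlap, $A$ must reassign at least

$$\begin{aligned}
n \cdot \max\left(\frac{|S \setminus T|}{|S|}, \frac{|T \setminus S|}{|T|}\right)
  &= n \cdot \max\left(\frac{|S \setminus (S \cap T)|}{|S|}, \frac{|T \setminus (S \cap T)|}{|T|}\right) \\
  &= n \cdot \max\left(1 - \frac{|S \cap T|}{|S|},\ 1 - \frac{|S \cap T|}{|T|}\right) \\
  &= n \cdot \left(1 - \frac{|S \cap T|}{\max(|S|, |T|)}\right)
\end{aligned}$$

jobs.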
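
The ideal cost $r^*$ is simple enough to evaluate exactly. Here is a minimal sketch using exact rational arithmetic; the function names are illustrative, and the second function implements the equivalent max form that appears in the proof so the two can be compared.

```python
from fractions import Fraction

def ideal_reassignment_cost(n, S, T):
    """r*(S, T) = n * (1 - |S ∩ T| / max(|S|, |T|)), computed exactly."""
    S, T = set(S), set(T)
    return n * (1 - Fraction(len(S & T), max(len(S), len(T))))

def ideal_cost_max_form(n, S, T):
    """Equivalent form n * max(|S \\ T| / |S|, |T \\ S| / |T|) from the proof."""
    S, T = set(S), set(T)
    return n * max(Fraction(len(S - T), len(S)), Fraction(len(T - S), len(T)))

# m = 3 machines and n = 3! = 6 identical jobs.
print(ideal_reassignment_cost(6, {1, 2, 3}, {1, 2}))  # one machine of three fails
print(ideal_reassignment_cost(6, {1, 2}, {2, 3}))     # one machine swapped
```

Losing one of three machines forces the two jobs on that machine to move, and swapping one of two machines forces half of the jobs to move.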



We will take $r^*$ as the ideal reassignment cost, and measure the reassignment cost of any
particular algorithm $A$ as the maximum over all $S$ and $T$ of the ratio of its reassignment cost
$r_A(S,T)$ to $r^*(S,T)$.

A justification for this approach is that the r* lower bound continues to hold in expectation 
for any algorithm — regardless of whether it distributes jobs evenly or not — provided the machine 
names are randomly permuted before the algorithm is used and such renaming is undone afterwards. 
This fact is formally stated in the following theorem. The random permutation of machine names, 
which is easily implemented by an oblivious adversary, prevents an algorithm from getting lucky 
by placing all of its jobs on machines that stay available in both S and T. 



Theorem 3 For any algorithm $A$ that maps states to assignments, choose a permutation $p$ uniformly
at random. Let $p^{-1}Ap$ be the algorithm that constructs an assignment for $S$ by applying
$A$ to $pS$ and then undoing the machine renaming. Then the algorithm $p^{-1}Ap$ reassigns at least
$n \cdot \left(1 - \frac{|S \cap T|}{\max(|S|, |T|)}\right)$ jobs in expectation when moving from state $S$ to state $T$.

The proof is given in Appendix A.

3 Preference-based algorithms

A preference-based algorithm is one in which each job is assigned a permutation of the machines
(which may depend on the set of other jobs), and always moves to the first available machine in its
permutation. We can think of the operation of a preference-based algorithm as choosing for each
job $j$ and state $S$ the machine $i$ in $S$ that minimizes $\sigma_j(i)$, where $\sigma_j$ is the permutation for job $j$.

We consider first the reassignment costs of preference-based algorithms in general and then the
makespan of the simple preference-based algorithm where each $\sigma_j$ is a random permutation.

3.1 Reassignment costs 

Preference-based algorithms have the desirable property that they achieve close to the minimum 
reassignment cost against an oblivious adversary, provided the preferences are permuted randomly 
before the algorithm is used. This fact is stated formally in the following theorem. 

Theorem 4 For each job $j$, let $\sigma_j$ be a permutation of the machines. Choose a permutation
$p$ uniformly at random. Then the preference-based algorithm using preferences $\sigma_j p$ reassigns an
expected $n \cdot \left(1 - \frac{|S \cap T|}{|S \cup T|}\right)$ jobs when moving from state $S$ to state $T$.

Proof: Fix some particular job $j$. Going from $S$ to $T$, job $j$ stays put just in case $\min_{i \in S} \sigma_j p(i) =
\min_{i \in T} \sigma_j p(i)$. This occurs precisely when $\min_{i \in S \cup T} \sigma_j p(i)$ is achieved by some $i$ in $S \cap T$, i.e. when
neither $S$ nor $T$ provides a machine that $j$ prefers to the best machine in their intersection. Since
$\sigma_j p$ is a random permutation, the probability that $j$ does not move is thus precisely $|S \cap T|/|S \cup T|$.
The expected number of moves is obtained by taking one minus this probability and summing over
all $n$ jobs. ∎
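
The statement can be checked empirically on a small instance. The sketch below is an illustration under an assumption that differs slightly from the theorem: instead of composing fixed preferences $\sigma_j$ with one random permutation $p$, it draws an independent uniformly random preference list per job, which gives each job the same marginal distribution. All names are hypothetical.

```python
import random

def make_random_preference(jobs, machines, rng):
    """Return (algorithm, rank), where rank[j][i] is the position of machine i
    in job j's independently shuffled preference list (lower is better)."""
    machines = sorted(machines)
    rank = {}
    for j in jobs:
        order = list(machines)
        rng.shuffle(order)
        rank[j] = {machine: pos for pos, machine in enumerate(order)}

    def algorithm(S):
        # Each job takes the available machine that minimizes its rank.
        return {j: min(S, key=rank[j].__getitem__) for j in rank}
    return algorithm, rank

def expected_moves(n, S, T):
    """The expectation n * (1 - |S ∩ T| / |S ∪ T|) from Theorem 4."""
    return n * (1 - len(S & T) / len(S | T))

rng = random.Random(0)
S, T = frozenset({0, 1, 2, 3}), frozenset({2, 3, 4, 5})
trials, n, moved = 200, 50, 0
for _ in range(trials):
    A, _rank = make_random_preference(range(n), range(6), rng)
    a_s, a_t = A(S), A(T)
    moved += sum(a_s[j] != a_t[j] for j in range(n))
print(moved / trials, expected_moves(n, S, T))  # empirical mean vs. formula
```

A job stays put exactly when its favorite machine in $S \cup T$ lies in $S \cap T$, which is the event analyzed in the proof.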



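Corollary 5 For any preference-based algorithm $A$ and any states $S$ and $T$, $E[r_A(S,T)] \le 2r^*(S,T)$,
where $r^*$ is defined as in Theorem 2.

Sketch of Proof: Follows from showing that
$E[r_A(S,T)] = n \cdot \left(1 - \frac{|S \cap T|}{|S \cup T|}\right) \le 2n \cdot \max\left(\frac{|S \setminus T|}{|S|}, \frac{|T \setminus S|}{|T|}\right) = 2r^*(S,T)$. ∎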
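
The inequality in the sketch can be expanded in a few lines. The following is one possible derivation, assuming $r^*$ is taken in the max form that appears in the proof of Theorem 2; it is not necessarily the argument given in the full paper.

```latex
\begin{align*}
1 - \frac{|S \cap T|}{|S \cup T|}
  &= \frac{|S \setminus T| + |T \setminus S|}{|S \cup T|} \\
  &\le \frac{|S \setminus T|}{|S|} + \frac{|T \setminus S|}{|T|}
     && \text{since } |S| \le |S \cup T| \text{ and } |T| \le |S \cup T| \\
  &\le 2 \max\left( \frac{|S \setminus T|}{|S|},\ \frac{|T \setminus S|}{|T|} \right).
\end{align*}
```

Multiplying through by $n$ gives $E[r_A(S,T)] \le 2r^*(S,T)$.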



3.2 Makespan for random preferences 

A typical case of preference-based algorithms is the random preference algorithm, in which a fixed
random permutation of machines is picked independently for each job as its preference list.

This randomization implicitly subsumes the random permutation of machine names assumed
in Corollary 5; thus the corollary applies and random preferences yield an expected cost of at most
$2r^*(S,T)$ reassignments going from state $S$ to $T$. For identical jobs, the makespan of random preference
when all machines are available is $\Theta(\log m/\log\log m) \cdot OPT$ w.h.p. for $n = m$, and is
$(n/m) + \Theta\left(\sqrt{(n/m)\log m}\right) = OPT \cdot \left(1 + \Theta\left(\sqrt{(m/n)\log m}\right)\right)$ w.h.p. for large $n$, using standard
balls-in-bins results (e.g. [9]). Reducing the number of available machines effectively replaces $m$ with $|S|$ in
the latter expression.

If the jobs are not identical, we can still argue that the makespan is $O(\log m/\log\log m) \cdot OPT$,
using a generalization to weighted balls of the usual balls-in-bins results. Here we assume that we
have a bound on both the weight of the largest ball and the expected weight of the balls assigned
to any one bin. These quantities are both bounded by the optimal makespan.

Lemma 6 Let $n$ balls with non-negative weights $w_1, w_2, \ldots, w_n$ be distributed independently and
uniformly at random into $b \le m$ bins. Let $W = \max\left(\max_i w_i,\ \frac{1}{b}\sum_{i=1}^{n} w_i\right)$. Let $X$ be the maximum
over all bins of the total weight of the balls in that bin. Then for any fixed $c$, there exists a constant
$k$ such that $X \le kW \lg m/\lg\lg m$ with probability $1 - o(m^{-c})$.

The proof, a straightforward application of characteristic functions, is given in Appendix B.
Applying Lemma 6 to the random-preference algorithm yields:

Theorem 7 The random-preference algorithm achieves a makespan of $O(\log m/\log\log m) \cdot OPT$
with high probability for any fixed set $S$ of available machines.

Proof: Apply Lemma 6 with $b = |S|$ and $W = OPT$. ∎
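
The quantities in Lemma 6 can be explored with a small simulation. This is a rough sketch, assuming balls are thrown independently and uniformly into bins; `lemma_w` computes the bound $W$ and `max_bin_weight` computes one sample of $X$. The names and the example parameters are illustrative, and the simulation demonstrates the setup rather than proving the bound.

```python
import random

def lemma_w(weights, b):
    """W = max(largest weight, average weight per bin)."""
    return max(max(weights), sum(weights) / b)

def max_bin_weight(weights, b, rng):
    """Throw each ball into a uniformly random bin; return the heaviest bin's load X."""
    load = [0.0] * b
    for w in weights:
        load[rng.randrange(b)] += w
    return max(load)

rng = random.Random(1)
weights = [1.0] * 64          # n = m = 64 identical balls
b = 64
X = max_bin_weight(weights, b, rng)
print(X / lemma_w(weights, b))  # ratio X / W for one random trial
```

Repeating the trial for growing $m$ gives a feel for how slowly the ratio $X/W$ grows when $n = m$.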

Note that the probability that the bound of Theorem 7 fails is only polynomially small in $m$, while
the number of possible subsets of available machines is exponential. So it is possible that the
makespan for some particular subset $S$ is much worse. This is not a problem if we assume an
oblivious adversary, but may become one if a more powerful adversary can choose $S$ after
determining the algorithm's preference lists, which it can do by observing the algorithm's behavior.

4 Algorithm based on binning and hashing 

In this section, we introduce an algorithm called BinHash which achieves constant approximation
ratios for both makespan and reassignment costs. Unlike the random-preference algorithm of
Section 3.2, the makespan bound is deterministic and holds in all states. (The reassignment cost bound
is still probabilistic.)

The algorithm is based on the observation that if the number of jobs $n \le \alpha|S|$ for some
constant load factor $\alpha$, we can get low reassignment costs and (trivially) optimal makespan by
assigning each job to the first empty machine on its preference list. This process (the hashing
step) is structurally equivalent to hashing with open addressing, and the number of reassignments
caused by adding or removing a job is bounded by the length of chains in the corresponding hashing
algorithm, which is a constant for fixed $\alpha$.

However, since we cannot assume $n \le \alpha|S|$ in general, we add an initial binning step where
jobs are assigned to $\max(1, \lfloor \alpha|S| \rfloor)$ bins, which are then hashed to machines. The binning step
sorts jobs by size, and then assigns each job to the bin whose index, expressed in binary, is the
longest available suffix of the index of the job. This is a form of round-robin assignment that, by
spreading the jobs roughly uniformly among the bins, guarantees a constant-factor approximation
of the optimal makespan. At the same time, the number of jobs that move when the number of bins
changes is small, since in addition to guaranteeing an even spread by size the binning procedure
also guarantees an even spread by job count, and adding or deleting a bin only splits some existing
bin or combines two previous bins.

We now give a formal definition of the algorithm. Given $n$ jobs with sizes $p_0 \ge p_1 \ge \cdots \ge p_{n-1}$,
and a set $S \in 2^M \setminus \{\emptyset\}$ of available machines, BinHash computes an assignment of the $n$ jobs to the
machines in $S$ in two stages:

1. Binning stage: Let $b = \max\{\lfloor \alpha|S| \rfloor, 1\}$, where $\alpha \in (0,1)$ is the load factor parameter of
the algorithm. Assign the $n$ jobs to $b$ bins by calling the function
$B(\{p_0, p_1, \ldots, p_{n-1}\}, b) \mapsto \{B_0^{(b)}, B_1^{(b)}, \ldots, B_{b-1}^{(b)}\}$, where $B_i^{(b)}$ is computed by

$$B_i^{(b)} \leftarrow \left\{ j = i + k \cdot 2^{\lceil \log(i+1) \rceil} \;\middle|\; k \ge 0 \wedge 0 \le j \le n-1 \right\} \setminus \bigcup_{l=i+1}^{b-1} B_l^{(b)}$$

for $i = b-1, b-2, \ldots, 0$.

In other words, each bin $i$ gets precisely those jobs whose binary expansions include the
binary expansion of $i$ as a suffix, provided there are no higher-numbered bins that capture
them first. It is immediate from the definition of $B_i^{(b)}$ that every job is assigned to exactly
one bin.

2. Hashing stage: The bins are now assigned to the machines in $S$ in order by a uniform hashing
function $\pi(i, S)$, where for each $i$ from $0$ to $\lfloor \alpha m \rfloor$, bin $i$ is hashed to the first machine in a fixed
random permutation of all $m$ machines that is in $S$ and not occupied by a lower-numbered
bin.
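
The two stages above can be sketched as follows. This is an illustrative reading of the definition, not the authors' implementation: it assumes that each potential bin has its own fixed random probe sequence over all machines (as in uniform hashing), that logarithms are base 2, and that job indices are ranks in non-increasing size order. The names `bin_index` and `BinHash` and the sample data are hypothetical.

```python
import math
import random

def bin_index(j, b):
    """Bin of job index j among b bins: the highest-numbered bin i < b whose
    binary expansion (of length i.bit_length()) is a suffix of j."""
    for i in range(b - 1, -1, -1):
        if j % (1 << i.bit_length()) == i:
            return i
    raise AssertionError("unreachable: bin 0 matches every job")

class BinHash:
    def __init__(self, sizes, machines, alpha, rng):
        self.alpha = alpha
        self.machines = sorted(machines)
        # Job ids in non-increasing order of size (p_0 >= p_1 >= ...).
        self.order = sorted(range(len(sizes)), key=lambda j: -sizes[j])
        # Assumption: one fixed random probe sequence per potential bin,
        # drawn once so that the result depends only on S.
        max_bins = max(1, math.floor(alpha * len(self.machines)))
        self.probes = []
        for _ in range(max_bins):
            perm = list(self.machines)
            rng.shuffle(perm)
            self.probes.append(perm)

    def bins_to_machines(self, S):
        """Hashing stage: lower-numbered bins choose their machine first."""
        b = max(1, math.floor(self.alpha * len(S)))
        occupied, placement = set(), []
        for i in range(b):
            machine = next(x for x in self.probes[i]
                           if x in S and x not in occupied)
            occupied.add(machine)
            placement.append(machine)
        return placement

    def __call__(self, S):
        """Binning stage followed by hashing: return a dict job -> machine."""
        placement = self.bins_to_machines(S)
        b = len(placement)
        return {job: placement[bin_index(rank, b)]
                for rank, job in enumerate(self.order)}

alg = BinHash(sizes=[5, 1, 4, 2, 3, 3, 2, 1], machines=range(8),
              alpha=0.5, rng=random.Random(0))
S = frozenset({0, 2, 3, 5, 6, 7})
print(alg(S))  # eight jobs spread over floor(0.5 * 6) = 3 machines of S
```

Since the probe sequences are fixed at construction time, calling the object twice on the same set returns the same assignment, so the sketch is path-independent.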

We now show that the BinHash algorithm achieves low makespan. This follows from the even 
distribution of the sizes of the bins and the fact that at most one bin is assigned to each machine. 

Lemma 8 Assuming that the optimal makespan of assigning $n$ jobs to $|S|$ machines is OPT, then

$$\max_{0 \le i \le b-1} \left|B_i^{(b)}\right| \le \frac{4n}{\alpha|S|}$$

and

$$\max_{0 \le i \le b-1} \sum_{j \in B_i^{(b)}} p_j \le (1 + 2/\alpha) \cdot OPT$$

for $b = \max\{\lfloor \alpha|S| \rfloor, 1\}$.

The proof is given in Appendix C.

To show low reassignment costs, we must take into account both reassignments caused by 
moving bins (when a machine leaves or becomes available) and reassignments caused by moving 
jobs between bins. The former is bounded by the fact that the number of jobs in each bin is 
roughly equal. The latter requires an analysis of the hashing step. To simplify the argument, we 
consider first only the case where a single machine becomes unavailable. The case of a new machine 
becoming available has the same cost by symmetry, and we will show later in Theorem 10 that the
cost of larger transformations can be expressed in terms of these two cases.

Lemma 9 For $T \subset S$ with $|T| = |S| - 1$, the algorithm BinHash reassigns at most an expected
$\frac{4(2-\alpha)n}{\alpha(1-\alpha)|S|}$ jobs when moving from state $S$ to state $T$.

Proof: The total number of reassignments can be upper bounded by the sum of the reassignments
caused by binning and the reassignments caused by bin displacements due to hashing.

According to the definition of the binning stage in BinHash, it is easy to verify that for any $b \ge 1$,

$$B^{(b)}_{b - 2^{\lfloor \log b \rfloor}} = B^{(b+1)}_{b - 2^{\lfloor \log b \rfloor}} \cup B^{(b+1)}_{b},
\qquad B^{(b)}_i = B^{(b+1)}_i \ \text{ for } i \ne b - 2^{\lfloor \log b \rfloor}.$$

Thus the removal of the last bin is the only source of reassignments caused by the binning process.

For the single machine in $S \setminus T$, there might be a bin $i$ assigned to it in state $S$. When the state
changes from $S$ to $T$, bin $i$ will be assigned to some other machine in $T$ by the hashing step. This
may lead to a recursive displacement of further bins with larger index than $i$. Since the placement
of bin $i$ in $T$ is uncorrelated with the preferences of later bins, the number of such displacements
can be bounded by the maximum number of bin displacements caused by inserting an additional
bin with a random index into state $T$'s assignment.

Suppose that $n < m$, and let $A(n, m, i)$ denote the expected number of displacements caused by inserting
an object with index $i \ge 0$ into a hash table with $m$ slots and $n$ objects by uniform hashing. It is
obvious that all objects with priority $h < i$ will not move, so it is equivalent to assume that
all such objects and their slots are not available to our analysis. Thus with probability at most $n/m$, object
$i$ hits a slot which is occupied by an object $i'$ where $i' > i$, and then triggers an expected $A(n,m,i')$
displacements caused by inserting object $i'$:

$$A(n,m,i) \le \begin{cases} 1 & i = n, \\ 1 + \frac{n}{m} A(n,m,i') \ \text{ for some } i' > i & \text{otherwise.} \end{cases}$$

By induction, it is easy to show that $A(n,m,i) \le 1/(1 - n/m)$ for all $i \ge 0$.

Applying this to the bin assignment, we have at most $A(b-1, |S|-1, i)$ displacements of bins,
where $b = \max\{\lfloor \alpha|S| \rfloor, 1\}$; therefore,

$$A(b-1, |S|-1, i) \le \frac{1}{1-\alpha}.$$

The total reassignment cost in terms of jobs is thus bounded from above by

$$\left(1 + \frac{1}{1-\alpha}\right) \max_{0 \le i \le b-1} \left|B_i^{(b)}\right| \le \frac{4(2-\alpha)n}{\alpha(1-\alpha)|S|}.$$

∎

We now combine the results in Lemmas 8 and 9 to obtain the full result:

Theorem 10 The following claims hold for any constant $0 < \alpha < 1$:

For any $n > 0$ and state $S$, assuming that the optimal makespan of assigning $n$ jobs to $|S|$
machines is OPT, the makespan of the assignment obtained by running BinHash with the $n$
jobs on $S$ is within $(1 + 2/\alpha) \cdot OPT$.

For any states $S$ and $T$, the expected number of reassignments performed by BinHash going
from $S$ to $T$ satisfies

$$E[r_B(S,T)] \le 2\left(1 + \frac{4(2-\alpha)}{\alpha(1-\alpha)}\right) \cdot r^*(S,T),$$

where the expectation is taken over the randomness of uniform hashing, and $r^*$ is defined as in
Theorem 2.

Proof: Since only one bin is assigned to each machine, the upper bound on makespan follows
directly from Lemma 8.

For the state transition from $S$ to $T$, where $T \subset S$, according to Lemma 9 we have that

$$\begin{aligned}
E[r_B(S,T)] &\le \min\left\{ n,\ \frac{4(2-\alpha)n}{\alpha(1-\alpha)} \left( \frac{1}{|S|} + \frac{1}{|S|-1} + \cdots + \frac{1}{|T|+1} \right) \right\} \\
            &\le \min\left\{ n,\ \frac{4(2-\alpha)n}{\alpha(1-\alpha)} \ln\frac{|S|}{|T|} \right\}.
\end{aligned}$$

For general $S$ and $T$, we have that

$$\begin{aligned}
E[r_B(S,T)] &\le E[r_B(S, S \cap T)] + E[r_B(T, S \cap T)] \\
            &\le \left(1 + \frac{4(2-\alpha)}{\alpha(1-\alpha)}\right) \left( r^*(S, S \cap T) + r^*(T, S \cap T) \right) \\
            &\le 2\left(1 + \frac{4(2-\alpha)}{\alpha(1-\alpha)}\right) \cdot r^*(S,T).
\end{aligned}$$

∎

It is not hard to show that the coefficient on reassignment costs is minimized at $\alpha = 2 - \sqrt{2} \approx
0.59$. Here the reassignment costs are bounded by Theorem 10 at approximately $48.6 \cdot r^*$ and the
makespan at $(3 + \sqrt{2}) \cdot OPT \approx 4.414 \cdot OPT$. There are a number of loose inequalities in the proof
of Theorem 10, and we believe that a more careful analysis would show that the correct minimum
coefficient on reassignment costs is closer to 12 in most cases.
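
These constants can be checked numerically. The sketch below evaluates the two coefficients from Theorem 10 as functions of $\alpha$ and compares the value at $\alpha = 2 - \sqrt{2}$ against a grid; the function names are illustrative.

```python
import math

def reassignment_coefficient(alpha):
    """2 * (1 + 4(2 - alpha) / (alpha (1 - alpha))), the bound on E[r_B] / r*."""
    return 2 * (1 + 4 * (2 - alpha) / (alpha * (1 - alpha)))

def makespan_coefficient(alpha):
    """1 + 2 / alpha, the bound on makespan / OPT."""
    return 1 + 2 / alpha

best = 2 - math.sqrt(2)
grid = [k / 1000 for k in range(1, 1000)]
grid_min = min(grid, key=reassignment_coefficient)

print(round(best, 4), round(grid_min, 3))
print(round(reassignment_coefficient(best), 2))  # equals 26 + 16 * sqrt(2)
print(round(makespan_coefficient(best), 3))      # equals 3 + sqrt(2)
```

Setting the derivative of $(2-\alpha)/(\alpha(1-\alpha))$ to zero gives $\alpha^2 - 4\alpha + 2 = 0$, whose root in $(0,1)$ is $2 - \sqrt{2}$, in agreement with the grid search.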

It is also worth noting that the makespan coefficient $1 + 2/\alpha$ can be reduced somewhat by
increasing $\alpha$. However, this dramatically increases the reassignment costs as $\alpha$ approaches 1, and
the makespan bound never drops below $3 \cdot OPT$, which is not much better than the bound for
$\alpha = 2 - \sqrt{2}$. However, it is not out of the question that a more sophisticated algorithm could
achieve higher utilization of the available machines without blowing up the reassignment costs.



5 Conclusions and future work 

We have described a new problem of path-independent load balancing for unreliable machines, 
where the goal is to minimize makespan while simultaneously minimizing the cost of reassigning 
jobs from one machine to another subject to the constraint that assignments cannot depend on the 
previous history. We have also obtained some initial results showing that it is possible to achieve 
constant approximation ratios to both the optimal makespan and optimal reassignment costs. 

However, much work still needs to be done. The proven constant approximation ratios for the 
BinHash algorithm — particularly for makespan — are still quite high, and it would be useful to have 
an algorithm with better constants. 

The assumption of identical machines is a strong one. It is not clear whether our results can 
be generalized to the case of uniform machines (where different machines have different capacities) 
or to the even more general case of nonuniform machines (where different jobs may have different 
effective sizes on different machines). This last case may be particularly important in interdomain 
routing, as particular flows may be forbidden from traveling over certain pipes by contractual 
requirements or security concerns. 

Finally, we have made very generous assumptions about the nature of the jobs and the nature of 
the adversary. It would be interesting to determine whether it is possible to solve path-independent 
load balancing with jobs that vary over time or with a more powerful adversary that can observe 
and respond to the algorithm's actions. 

A Proof of Theorem 3

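Fix some particular job $j$. Let $A_S(j)$ be the machine to which $A$ assigns job $j$ in state $S$. Note
that $A_S(j) \in S$. Observe that:

• For any $(U, V)$ where $|U| = |S|$, $|V| = |T|$ and $|U \cap V| = |S \cap T|$, there exists some permutation
$p$ such that $(U, V) = (pS, pT)$.

• There are $\binom{m}{|S|} \binom{|S|}{|S \cap T|} \binom{m - |S|}{|T| - |S \cap T|}$ equivalence classes after applying permutations to $(S, T)$, each
of which contains $|S \cap T|!\,(|S| - |S \cap T|)!\,(|T| - |S \cap T|)!\,(m - |S \cup T|)!$ permutations. We denote
this value as $e(S,T)$.

• Fix $S$ and $T$. For any $pS$, the number of $pT$ which contain $A_{pS}(j)$, i.e., $|\{pT \mid A_{pS}(j) \in pT\}|$,
equals $\binom{|S| - 1}{|S \cap T| - 1} \binom{m - |S|}{|T| - |S \cap T|}$.

Then the probability that job $j$ stays put when the state goes from $S$ to $T$ is

$$\begin{aligned}
\Pr_p[A_{pS}(j) = A_{pT}(j)]
  &= \frac{1}{m!} \left|\{ p \mid A_{pS}(j) = A_{pT}(j) \}\right| \\
  &= \frac{1}{m!} \sum_{pS} \left|\{ pT \mid A_{pS}(j) = A_{pT}(j) \}\right| \cdot e(S,T) \\
  &\le \frac{1}{m!} \sum_{pS} \left|\{ pT \mid A_{pS}(j) \in pT \}\right| \cdot e(S,T) \\
  &= \frac{1}{m!} \sum_{pS} \binom{|S| - 1}{|S \cap T| - 1} \binom{m - |S|}{|T| - |S \cap T|} \cdot e(S,T) \\
  &= \frac{|S \cap T|}{|S|}.
\end{aligned}$$

Symmetrically,

$$\Pr_p[A_{pS}(j) = A_{pT}(j)] \le \frac{|S \cap T|}{|T|}.$$

Therefore, we have $\Pr_p[A_{pS}(j) = A_{pT}(j)] \le \frac{|S \cap T|}{\max(|S|, |T|)}$. The probability that job $j$ is reassigned
from $S$ to $T$ is thus lower bounded by $1 - \frac{|S \cap T|}{\max(|S|, |T|)}$. The expected total cost is obtained by
summing over all $n$ jobs.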
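
The counting in this argument can be sanity-checked numerically. The sketch below assumes the class count is the product of three binomial coefficients and the class size is the product of four factorials, as read from the bullets above, and writes $s = |S|$, $t = |T|$, $k = |S \cap T| \ge 1$. Function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial

def class_count(m, s, t, k):
    """Number of distinct pairs (pS, pT): choose pS, then the overlap, then the rest of pT."""
    return comb(m, s) * comb(s, k) * comb(m - s, t - k)

def class_size(m, s, t, k):
    """e(S, T): permutations mapping (S, T) to one fixed pair (pS, pT)."""
    return (factorial(k) * factorial(s - k) * factorial(t - k)
            * factorial(m - s - t + k))

def stay_probability_bound(m, s, t, k):
    """Upper bound on Pr[A_pS(j) = A_pT(j)]; assumes k >= 1."""
    favorable = comb(m, s) * comb(s - 1, k - 1) * comb(m - s, t - k)
    return Fraction(favorable * class_size(m, s, t, k), factorial(m))

m, s, t, k = 6, 4, 3, 2
print(class_count(m, s, t, k) * class_size(m, s, t, k) == factorial(m))
print(stay_probability_bound(m, s, t, k))  # equals k / s
```

The first check confirms that the classes partition all $m!$ permutations, and the second that the sum collapses to $|S \cap T|/|S|$.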



B Proof of Lemma 6

The proof is a straightforward generalization of the usual argument using characteristic functions
for $n = m$ identical balls.

Let $p = 1/b$, $q = 1 - p$, and $a > 0$. Let $X_i$ be a random variable representing the weight that
the $i$-th ball contributes to some particular bin. Let $S = \sum_i X_i$. Then
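
$$\begin{aligned}
E\left[e^{aS}\right] &= E\left[\prod_i e^{aX_i}\right] \\
  &= \prod_i \left(q e^{0} + p e^{a w_i}\right) \\
  &\le \prod_i \exp\left(p\left(e^{a w_i} - 1\right)\right) \\
  &= \exp\left(p \sum_i \left(e^{a w_i} - 1\right)\right),
\end{aligned}$$

where the inequality uses $q + p x = 1 + p(x - 1) \le \exp(p(x-1))$.

Now apply Markov's inequality to get

$$\Pr[S \ge t] = \Pr\left[e^{aS} \ge e^{at}\right]
  \le \frac{\exp\left(p \sum_i \left(e^{a w_i} - 1\right)\right)}{\exp(at)}
  = \exp\left(p \sum_i \left(e^{a w_i} - 1\right) - at\right).$$

We now choose the $w_i$ to maximize this quantity subject to the given constraints $w_i \le W$ and
$\sum_i w_i \le bW$. Observe that this is equivalent to maximizing $\sum_i e^{a w_i}$. Since $e^{a w_i}$ is convex, this is
maximized subject to the sum constraint by setting $w_i = W$ for $i = 1 \ldots b$ and $w_i = 0$ elsewhere.
We thus have

$$\begin{aligned}
\Pr[S \ge t] &\le \exp\left(p \sum_i \left(e^{a w_i} - 1\right) - at\right) \\
  &\le \exp\left(p b \left(e^{aW} - 1\right) - at\right) \\
  &\le \exp\left(e^{aW} - at\right).
\end{aligned}$$

It is not hard to show that the best choice for $a$ is $\ln(t/W)/W$, giving

$$\begin{aligned}
\Pr[S \ge t] &\le \exp\left(e^{\ln(t/W)} - (t/W)\ln(t/W)\right) \\
  &= \exp\left((t/W) - (t/W)\ln(t/W)\right) \\
  &= \exp\left((t/W)(1 - \ln(t/W))\right).
\end{aligned}$$

Finally, set $t/W = k \ln m/\ln\ln m$ to get

$$\begin{aligned}
\Pr[S \ge t] &\le \exp\left(\frac{k \ln m}{\ln\ln m}\left(1 - \ln k - \ln\ln m + \ln\ln\ln m\right)\right) \\
  &= m^{-k\left(1 - \frac{1 - \ln k}{\ln\ln m} - \frac{\ln\ln\ln m}{\ln\ln m}\right)} \\
  &= m^{-k(1 - o(1))}.
\end{aligned}$$

So the probability that any of the $b$ bins exceeds $kW \ln m/\ln\ln m$ is at most $b\, m^{-k(1 - o(1))} \le
m^{1 - k(1 - o(1))}$, which is $o(m^{-c})$ for sufficiently large $k$.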



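C Proof of Lemma 8

By noting that $B_i^{(b)}$ contains all such $j$ whose longest available suffix in binary expansion is $i$, it is
easy to verify that

$$B_i^{(b)} \subseteq \left\{ j = i + k \cdot 2^{\lfloor \log b \rfloor} \;\middle|\; k \ge 0 \wedge 0 \le j \le n-1 \right\}. \qquad (2)$$

Therefore, for any $0 \le i \le b-1$,

$$\left|B_i^{(b)}\right| \le \left\lceil \frac{n}{2^{\lfloor \log b \rfloor}} \right\rceil \le \frac{2n}{b} \le \frac{4n}{\alpha|S|}.$$

As for the loads of bins, note that $OPT \ge \max\left\{p_0,\ \frac{1}{|S|}\sum_{j=0}^{n-1} p_j\right\}$. According to (2) and the
non-increasing order of $p_j$, we have that, for each $0 \le i \le b-1$,

$$\begin{aligned}
\sum_{j \in B_i^{(b)}} p_j &\le \sum_{k \ge 0} p_{i + k \cdot 2^{\lfloor \log b \rfloor}} \\
  &= p_i + \sum_{k \ge 1} p_{i + k \cdot 2^{\lfloor \log b \rfloor}} \\
  &\le p_0 + \sum_{k \ge 1} \frac{1}{2^{\lfloor \log b \rfloor}} \sum_{t=0}^{2^{\lfloor \log b \rfloor} - 1} p_{i + k \cdot 2^{\lfloor \log b \rfloor} - t} \\
  &\le p_0 + \frac{2}{b} \sum_{j=0}^{n-1} p_j \\
  &\le p_0 + \frac{2}{\alpha|S|} \sum_{j=0}^{n-1} p_j \\
  &\le (1 + 2/\alpha) \cdot OPT.
\end{aligned}$$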